ABSTRACT


In this study, two classes of supervised machine learning algorithms,
regression models and classification models, were used to predict the
Ultimate Tensile Strength (MPa) and the weld joint efficiency of Friction
Stir Welded joints. The results showed that the Polynomial regression model
yields better results than the other supervised regression models, and that
the Decision Tree (with both Gini Index and Information Gain criteria) and
Artificial Neural Network classification models gave better classification
results than the K-Nearest Neighbor classification model.
1. INTRODUCTION


Machine learning can be considered a subset of artificial intelligence in which algorithms learn from available information and
data. The history of machine learning dates back to 1950, when Alan Turing devised the Turing test to assess the intelligence of
a computer system [1]. In 1952, Arthur Samuel developed the first learning-based computer program [2]. In 1957, Frank Rosenblatt
created the first neural network for computers, the perceptron, with the main objective of simulating the thought process of
humans. In 1967, the nearest neighbor algorithm was designed, which allowed computers to carry out basic pattern recognition.
Machine learning algorithms are divided into four categories: supervised, unsupervised, semi-supervised, and reinforcement
learning algorithms [3-5]. Supervised machine learning algorithms are used to predict future, unseen, or unavailable data based
on the available dataset [6-7]. They are further subdivided into two types: regression models and classification models [8-9].


Supervised machine learning algorithms are finding various applications in the manufacturing and materials industries, where
they are used to predict material properties such as fracture strength, tensile strength, elongation percentage, and hardness
[10-13].

The present work implements supervised machine learning regression algorithms (polynomial regression, support vector regression,
decision tree regression, random forest regression, and an artificial neural network) to predict the Ultimate Tensile Strength
of friction stir welded similar AA2024-T3 aluminum alloy joints. It also implements supervised machine learning classification
algorithms (K-nearest neighbors, a decision tree classifier with Gini Index and Information Gain as criteria, and an artificial
neural network) to identify joints whose weld strength efficiency is greater than or equal to 70 percent of the base metal.


2. MATERIAL AND METHODS 


Similar AA2024-T3 aluminum alloy plates measuring 200 × 100 × 3.5 mm were mounted on a vertical milling machine and butt welded
by the Friction Stir Welding process using a tool steel (HSS) tool with a tapered pin profile [14]. The experimental dataset
consists of 27 observations taken from the work of Hussein et al. [14] and converted to a CSV (Comma Separated Values) file so
that it can be imported by the Python code. The dataset contains Tool Shoulder Diameter (mm), Tool Rotational Speed (rpm), Tool
Traverse Speed (mm/min), Number of weld passes, and Tool Tilt Angle as input parameters, while the Ultimate Tensile Strength
(UTS) is the output parameter. The Python libraries imported for constructing and executing the machine learning algorithms
were NumPy, Matplotlib, Seaborn, Pandas, TensorFlow, and Keras. Figure 1 shows the hierarchy of the operations applied to the
CSV dataset.


Figure 1: Various operations subjected to the imported dataset 


The dataset is also subjected to machine learning classification models. For classification, the output is labeled 0 if the UTS
is less than 70% of the reference value and 1 if the UTS is greater than or equal to 70% of the reference value. The labeled
data are then passed to three classifier models: K-Nearest Neighbors, Decision Tree (with Gini Index and Information Gain as
criteria), and a Neural Network classification model.
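The labeling rule can be sketched in a few lines of Python. The function name label_joints and the reference strength of 400 MPa are illustrative assumptions and are not values taken from the study; only the 70% threshold comes from the text.

```python
def label_joints(uts_values, reference_uts, threshold=0.70):
    """Return 1 for joints with UTS >= threshold * reference_uts, else 0."""
    cutoff = threshold * reference_uts
    return [1 if uts >= cutoff else 0 for uts in uts_values]


# Hypothetical reference strength (MPa), used only to make the example concrete.
REFERENCE_UTS = 400.0

# Illustrative UTS values (MPa) spanning the range reported for the dataset.
sample_uts = [95.0, 245.0, 304.5, 361.0]
labels = label_joints(sample_uts, REFERENCE_UTS)
print(labels)  # [0, 0, 1, 1]
```

Keeping the threshold as a parameter makes it easy to check how sensitive the class balance is to the chosen efficiency limit.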


3. RESULTS AND DISCUSSION 


3.1 Exploratory Data Analysis 

At this stage, we explored the relationship between each feature and the target variable. Features that show no relation with
the target variable are candidates for removal. Table 1 summarizes the distribution of the data, which helps us judge whether
the data need to be normalized, and also provides other descriptive statistics.
Table 1: Statistical summary of the experimental dataset


       Shoulder Diameter (mm)  Rotational Speed (rpm)  Traverse Speed (mm/min)  Tilt Angle    Passes   UTS (MPa)
count               27.000000               27.000000                27.000000    27.00000  27.00000   27.000000
mean                14.000000             1033.333333                58.666667     2.00000   2.00000  249.666667
std                  1.664101              292.206460                16.751579     0.83205   0.83205   69.089685
min                 12.000000              700.000000                40.000000     1.00000   1.00000   95.000000
25%                 12.000000              700.000000                40.000000     1.00000   1.00000  217.500000
50%                 14.000000             1000.000000                56.000000     2.00000   2.00000  245.000000
75%                 16.000000             1400.000000                80.000000     3.00000   3.00000  304.500000
max                 16.000000             1400.000000                80.000000     3.00000   3.00000  361.000000
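As a rough illustration of how the statistics in Table 1 are computed, the sketch below rebuilds one column with the Python standard library. It assumes a balanced design of nine welds at each of three traverse speeds (40, 56 and 80 mm/min), an assumption that is consistent with the reported count, mean and standard deviation of the Traverse Speed column. The helper name describe is hypothetical and mirrors the layout of the table, not the authors' code.

```python
import statistics


def describe(values):
    """Return count, mean, sample std, min, quartiles and max of a numeric list."""
    ordered = sorted(values)
    # method="inclusive" interpolates linearly between data points and treats
    # the sample minimum and maximum as the 0th and 100th percentiles.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation (n - 1)
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


# Assumed design: nine welds at each of three traverse speeds (mm/min).
traverse_speed = [40.0] * 9 + [56.0] * 9 + [80.0] * 9
summary = describe(traverse_speed)
for name, value in summary.items():
    print(f"{name:>5}  {value:.6f}")
```

A wide spread between the minimum and maximum of one column relative to another is the usual signal that scaling is worth considering before fitting distance-based or gradient-based models.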
3.2 Checking Null Values 


The check_null() function counts the null values in the dataset. Any null values found are replaced by the mean.
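A minimal sketch of this step is given below, assuming the data are held as a dictionary of columns and that "the mean" refers to the mean of the affected column. The name check_null is taken from the text, but its body here, the helper fill_null_with_mean, and the toy values are assumptions made for illustration.

```python
import math


def is_null(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_null(columns):
    """Count missing entries in each column."""
    return {name: sum(is_null(v) for v in values) for name, values in columns.items()}


def fill_null_with_mean(columns):
    """Return a copy in which missing entries are replaced by the column mean."""
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if not is_null(v)]
        if not present:
            # A column with no observed values has no mean; leave it unchanged.
            filled[name] = list(values)
            continue
        mean = sum(present) / len(present)
        filled[name] = [mean if is_null(v) else v for v in values]
    return filled


# Hypothetical toy data with one missing UTS value.
data = {
    "Tilt Angle": [1.0, 2.0, 3.0],
    "UTS (MPa)": [217.5, None, 304.5],
}
null_counts = check_null(data)
cleaned = fill_null_with_mean(data)
print(null_counts)           # {'Tilt Angle': 0, 'UTS (MPa)': 1}
print(cleaned["UTS (MPa)"])  # [217.5, 261.0, 304.5]
```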


3.3 Plotting Graph of p-Value Function and Contour Plot 

The plot_graph_pvalue function draws a line plot of two given variables and prints the p-value and the Pearson correlation
coefficient. The contour_plot function draws a contour plot for the given variables. Figure 2 shows UTS (MPa) plotted against
shoulder diameter (mm). The p-value and Pearson coefficient obtained for this pair are 0.50151 and 0.135, respectively.
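To make the link between the Pearson coefficient and its p-value explicit, the sketch below computes both with the standard library only. It assumes the usual two-sided test of zero correlation based on a Student t statistic with n - 2 degrees of freedom, which is also what a library routine such as scipy.stats.pearsonr reports. The function names and the numerical integration are illustrative choices, not the authors' plot_graph_pvalue implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, dof):
    """Density of the Student t distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + t * t / dof) ** (-(dof + 1) / 2)


def p_value_from_r(r, n, steps=2000):
    """Two-sided p-value for the null hypothesis of zero correlation."""
    dof = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(dof / (1 - r * r))
    # Simpson's rule (steps must be even) for the t density between 0 and t.
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Coefficients reported in the text, with n = 27 observations.
p_shoulder = p_value_from_r(0.135, 27)
p_rotation = p_value_from_r(0.638, 27)
print(f"shoulder diameter: p = {p_shoulder:.3f}")
print(f"rotational speed:  p = {p_rotation:.5f}")
```

With 27 observations, a coefficient of 0.135 gives a p-value of about 0.5, while 0.638 gives a p-value well below 0.001, in line with the values quoted for these two parameters.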


Figure 2. Relationship between UTS (MPa) and Shoulder Diameter (mm)
From the p-value (0.50151) and the Pearson coefficient (0.135), the linear correlation between shoulder diameter and UTS is
weak and not statistically significant at the 5% level. The graph nevertheless shows that the UTS values start to drop as the
shoulder diameter increases beyond 14 mm. Figure 3 shows the contour plot of shoulder diameter and UTS.
Figure 3. Contour Plot between UTS (MPa) and Shoulder diameter (mm)

Figure 4 shows UTS (MPa) plotted against the Tool Rotational Speed (rpm). The p-value and Pearson coefficient obtained for this
pair are 0.00034 and 0.638, respectively. Figure 5 shows the contour plot between Tool Rotational Speed (rpm) and UTS (MPa).

Figure 4. Relationship between UTS (MPa) and Tool Rotational Speed (rpm)

Figure 5. Contour Plot between UTS (MPa) and Tool Rotational Speed (rpm)

From the graph in Figure 4, it is observed that the UTS increases beyond a tool rotational speed of 1000 rpm. Figure 6 shows
UTS (MPa) plotted against the Tool Traverse Speed (mm/min). The p-value and Pearson coefficient obtained for this pair are
0.00045 and 0.628, respectively. Figure 7 shows the contour plot between Tool Traverse Speed (mm/min) and UTS (MPa).

Figure 6. Relationship between UTS (MPa) and Tool Traverse Speed (mm/min)

Figure 7. Contour Plot between UTS (MPa) and Tool Traverse Speed (mm/min)

From the graph in Figure 6, it is observed that the UTS continues to increase at traverse speeds above 55 mm/min. Figure 8
shows UTS (MPa) plotted against the Tilt Angle. The p-value and Pearson coefficient obtained for this pair are 0.34943 and
0.187, respectively. Figure 9 shows the contour plot between Tilt Angle and UTS (MPa).

Figure 8. Relationship between UTS (MPa) and Tool Tilt Angle

Figure 9. Contour Plot between UTS (MPa) and Tool Tilt Angle


From the graph in Figure 8, it is observed that the UTS decreases as the tool tilt angle increases up to 2 degrees, but beyond
2 degrees the UTS starts increasing. Figure 10 shows UTS (MPa) plotted against the number of weld passes. The p-value and
Pearson coefficient obtained for this pair are 0.63737 and 0.095, respectively. Figure 11 shows the contour plot between the
number of weld passes and UTS (MPa).

Figure 10. Relationship between UTS (MPa) and Number of passes

Figure 11. Contour Plot between UTS (MPa) and Number of passes


Figure 12 shows the correlation heat map.

Figure 12. Correlation Heat map


Figure 13 shows the correlation of each input with the response variable as a bar graph. From the plot analysis, we conclude
that all input parameters have some relation with UTS, so none of them is dropped from the dataset.

Figure 13. Response variable plot


3.4 Implementation of Supervised Machine Learning Regression Models


First, the dataset is split into training and testing sets. The statistics shown in Table 2 are then used to decide whether
the dataset should be normalized.
Table 2. Statistical Data Analysis


       Shoulder Diameter (mm)  Rotational Speed (rpm)  Traverse Speed (mm/min)  Tilt Angle    Passes   UTS (MPa)
count               27.000000               27.000000                27.000000    27.00000  27.00000   27.000000
mean                14.000000             1033.333333                58.666667     2.00000   2.00000  249.666667
std                  1.664101              292.206460                16.751579     0.83205   0.83205   69.089685
min                 12.000000              700.000000                40.000000     1.00000   1.00000   95.000000
25%                 12.000000              700.000000                40.000000     1.00000   1.00000  217.500000
50%                 14.000000             1000.000000                56.000000     2.00000   2.00000  245.000000
75%                 16.000000             1400.000000                80.000000     3.00000   3.00000  304.500000
max                 16.000000             1400.000000                80.000000     3.00000   3.00000  361.000000


Since the input variables span very different ranges (for example, rotational speed runs from 700 to 1400 rpm while tilt angle
runs from 1 to 3), the data are normalized.
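The split-then-scale order can be sketched as follows. The text does not state the split ratio or the scaling method, so the 80/20 split, the z-score scaling, the function names, and the toy rows below are all assumptions chosen for illustration. The point of the sketch is that the scaling statistics are computed on the training rows only and then reused for the test rows.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Compute the per-column mean and standard deviation on the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling, (value - mean) / std, to every column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows of (rotational speed in rpm, tilt angle).
rows = [(speed, tilt) for speed in (700.0, 1000.0, 1400.0) for tilt in (1.0, 2.0, 3.0)]
train, test = train_test_split(rows)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)  # reuses the training statistics
```

After scaling, both columns of the training set have zero mean and unit standard deviation, so rotational speed no longer dominates the other inputs purely because of its units.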

The get_score_regression function reports the Mean Absolute Error, the Mean Squared Error, and the R-squared value (coefficient
of determination) used to analyze each model's performance. The Polynomial regression, Decision Tree regression, Random Forest
regression, Support Vector Regression, and Neural Network regression models were implemented, and the best-performing one was
selected. Table 3 shows the results of the regression models implemented on the dataset.
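The three metrics can be written out directly from their definitions. The sketch below reuses the name get_score_regression from the text, but the body and the toy values are assumptions for illustration and are not the authors' implementation.

```python
def get_score_regression(y_true, y_pred):
    """Return the MAE, MSE and R-squared of predictions against true values."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "R2": r2}


# Hypothetical targets and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
good = get_score_regression(y_true, [1.5, 2.0, 2.5, 4.0])
bad = get_score_regression(y_true, [4.0, 3.0, 2.0, 1.0])
print(good)  # MAE 0.25, MSE 0.125, R2 0.9
print(bad)   # MAE 2.0, MSE 5.0, R2 -3.0
```

The second case shows why R-squared can be negative: the residual sum of squares exceeds the total sum of squares whenever the model predicts worse than simply returning the mean of the targets.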


Table 3. Mean Absolute Error, Mean Square Error and Coefficient of determination 


   Model                    Mean Absolute Error       MSE         R2
0  Polynomial Regression               0.029136  0.001868   0.998428
1  SVR                                 0.453090  0.554425   0.533566
2  Decision Tree Regressor             0.129469  0.019894   0.983263
3  Random Forest Regressor             0.293354  0.146900   0.876414
4  DNN                                 1.082620  2.339795  -2.860712


Figure 14 shows the training performance of the Neural Network regressor model.

Figure 14. Variation of loss function with number of epochs

Figure 15 and Figure 16 show the model performance in terms of Mean Square Error (MSE) and Mean Absolute Error (MAE),
respectively.

Figure 16. Representation of Mean Absolute Error of each Machine Learning Regression Model


From Figures 15 and 16, it is clear that the Polynomial Regression and Decision Tree models fit better than the other models,
while on the basis of Mean Absolute Error the Polynomial Regression model outperforms all other models.


3.5 Implementation of Supervised Machine Learning Classification Models 
K-Nearest Neighbors, Decision Tree Classifier (with Gini Index and Information Gain as criteria), and Neural Network
classification models were applied to the dataset. The accuracy of the classification models is shown in Table 4. The
Artificial Neural Network, the Decision Tree classifier with Gini Index, and the Decision Tree classifier with Information Gain
all achieve higher accuracy than the K-Nearest Neighbors model.
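The two decision tree criteria differ only in the impurity measure used to score a candidate split. The sketch below, with hypothetical function names and toy labels, computes the Gini index and the entropy of a set of class labels and the impurity decrease of a split; with entropy as the measure, this decrease is the information gain.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


# Hypothetical labels: 0 = below the 70% threshold, 1 = at or above it.
parent = [0, 0, 0, 1, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [1, 1, 1, 1]
gini_gain = impurity_decrease(parent, left, right, gini)
info_gain = impurity_decrease(parent, left, right, entropy)
print(f"Gini decrease:    {gini_gain:.5f}")
print(f"Information gain: {info_gain:.5f}")
```

Both measures are zero for a pure node and largest for an even class mix, which is why the two criteria often select the same splits on small datasets.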
Table 4. F1 score of the machine learning classification models


Model                                        F1 score
KNN                                          0.89
Decision Tree Classifier (Gini Index)        1.00
Decision Tree Classifier (Information Gain)  1.00
ANN                                          1.00
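Assuming the metric reported in Table 4 is the standard F1 score, it is the harmonic mean of precision and recall for the positive class. The sketch below uses a hypothetical function name and toy labels to show the calculation.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical test labels and predictions.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(score)  # 0.75
```

With a dataset of 27 welds the held-out set contains only a handful of joints, so a single misclassification changes the score noticeably.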


Figure 17 shows the performance on the training and testing sets during training of the Neural Network classification model.


Figure 17. Training and testing set model performance (model loss versus epochs, 0 to 500, for the training and validation sets)


Figure 18 shows the F1 score of each classification model.

Figure 18. F1 score of the machine learning classification models


4. CONCLUSION


From the obtained results, we can see that the Decision Tree and Polynomial regressors outperform all other regression models.
As classifiers, the Artificial Neural Network, Decision Tree (Gini Index), and Decision Tree (Information Gain) outperform the
K-Nearest Neighbor model. In both cases, the deep learning model could have performed much better had more data been available.